Project 1 - Jerry Lai

Methodology
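The notebook's own cells are not reproduced here, but the coarse-to-fine pyramid search referred to in the results can be sketched roughly as follows. This is a minimal illustration assuming numpy and circular shifts; the function names (`pyramid_align`, `shift_score`) are hypothetical, not necessarily the notebook's own:

```python
import numpy as np

def ssd(a, b):
    # sum of squared differences: lower is a better match
    return np.sum((a - b) ** 2)

def shift_score(ref, img, dx, dy):
    # score img circularly shifted by (dx, dy) against ref
    return ssd(ref, np.roll(img, (dy, dx), axis=(0, 1)))

def pyramid_align(ref, img, radius=2):
    # coarse-to-fine search: recurse on 2x-downscaled copies, then
    # refine the doubled best offset within a small search window
    if min(ref.shape) < 32:
        cx, cy = 0, 0
    else:
        cx, cy = pyramid_align(ref[::2, ::2], img[::2, ::2], radius)
        cx, cy = cx * 2, cy * 2
    candidates = [(cx + dx, cy + dy)
                  for dx in range(-radius, radius + 1)
                  for dy in range(-radius, radius + 1)]
    return min(candidates, key=lambda c: shift_score(ref, img, *c))
```

Because each level only refines within a small window around the doubled coarse estimate, a wrong pick at a coarse level can lock the search into a local minimum, which is the failure mode noted in the results below.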

Score function
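The summary mentions an SSD score; as a sketch of what such a score function might look like (assuming numpy; the actual notebook code may differ):

```python
import numpy as np

def ssd_score(a, b):
    # sum of squared differences between two equal-shaped channels;
    # lower means a better alignment
    return np.sum((a.astype(float) - b.astype(float)) ** 2)
```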

Results (before extra stuff)

Images

As seen above, the pyramid method works well for most of the images. However, in some of them, especially ones with repeated patterns, it gets stuck in local minima.

Offsets (before extra stuff):

Offsets are given in the format: `ec_list_of_pyramid_offsets = [[best green x, best green y, best green angle], [best red x, best red y, best red angle]]`

Extra stuff

In this section I:

  1. Use gradients to find edges
  2. Use a tensor product score (which might work better with the gradients)
  3. Clip the edges

Using gradients instead of matching colors

So far, we have been aligning the channels by minimizing the difference between their pixel arrays. However, this only works well when an image has lots of contrast in brightness and not much saturation. In images with more vibrant colors and less brightness contrast, similarity between the color channels means little. This can be seen in the results for tobolsk, melon, self portrait, emir, and cathedral (in the case of cathedral, green and red were matched well to each other but not to blue).

This is not a problem for humans: although I can't speak for other humans, if I were doing this task by hand I would match the edges instead of the colors.

In this section I attempt to use the magnitude of the gradient to find the edges.

Code below. First I test the method on monastery, one of the JPGs that the original method did not handle well.
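A minimal sketch of a gradient-magnitude edge map, assuming numpy (`edge_map` is a hypothetical name, not necessarily the one used in the notebook):

```python
import numpy as np

def edge_map(channel):
    # central-difference gradients along rows and columns;
    # the magnitude is large at edges and near zero in flat regions
    gy, gx = np.gradient(channel.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)
```

The alignment search then runs on `edge_map(channel)` instead of the raw channel, so it is the edges that get matched rather than the brightness values.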

This seems to work very well for this little JPG. Next, I will test whether:

  1. It works on a big TIF - I will try melons, which the original method failed on.
  2. It still works after scaling down. In particular, it must work when scaled down 32x, which is the most it will be scaled down in the pyramid search.
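Point 2 can also be checked synthetically. A small sketch (assuming numpy) showing that an edge still registers in the gradient magnitude after 32x decimation:

```python
import numpy as np

def grad_mag(channel):
    # gradient-magnitude edge map, as above
    gy, gx = np.gradient(channel.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)

# synthetic 1024x1024 "scan" with one vertical edge down the middle
big = np.zeros((1024, 1024))
big[:, 512:] = 1.0

# decimate 32x, the coarsest level of the pyramid search
small = big[::32, ::32]

# the edge is still present and distinguishable in the small edge map
edge_strength = grad_mag(small).max()
```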

Clearly, these are very dark - even the one that wasn't deliberately scaled down was scaled automatically to fit the screen. The important thing, however, is that the edges still exist and are distinguishable. Whether they are easy or hard to see with the naked eye shouldn't matter much to a machine.

Using tensor product score

This is another idea based on how humans (or at least I) align images. When we are aligning the edges, we don't care about most of the image - we only care that the lines match, and we pay no attention to the black space at all. Thus, an elementwise product is a better representation of how a human judges the quality of an alignment.

So I wrote tensor_product_score, which takes the sum of the elementwise product of the two arrays being compared.
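A minimal sketch of such a score, assuming numpy (the real `tensor_product_score` in the notebook may differ in detail):

```python
import numpy as np

def tensor_product_score(a, b):
    # sum of the elementwise product: overlapping edges reinforce
    # each other, while black regions contribute nothing
    return np.sum(a * b)
```

Note that unlike SSD, this score is maximized rather than minimized, so the search has to flip its comparison (or negate the score).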

Clipping the image

The clipping uses a simple algorithm: take the maximum offset for each dimension and clip that many pixels from both sides of that dimension.
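A sketch of that clipping step, assuming numpy and offsets in the `[x, y, angle]` format used above (the helper name `clip_edges` is mine):

```python
import numpy as np

def clip_edges(img, offsets):
    # offsets: e.g. [[gx, gy, g_angle], [rx, ry, r_angle]];
    # trim the max absolute offset per axis from both sides
    dx = max(abs(o[0]) for o in offsets)
    dy = max(abs(o[1]) for o in offsets)
    h, w = img.shape[:2]
    return img[dy:h - dy, dx:w - dx]
```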

Also, I no longer check angles - in each of the original images, the best angle was 0, so there's really no point.

Results (with EC stuff)

Offsets (with EC stuff)

Offsets are of the format:

[[best green x, best green y, best green angle],
[best red x, best red y, best red angle]]

Summary

It appears that using edges, but with the SSD score, achieved the most consistent results. However, it still failed on melons and self portrait - both images with lots of repeating small details. This is also the sort of image that would confuse a human the most, so the failure is understandable.